Assembly for Beginners

Have you ever had a look at C, C++, or Pascal? Did you feel dizzy after reading the first 10 commands? Well, “You ain’t seen nothing yet”! Assembly, being only one level away from the user friendliness of programming in binary, is in some respects the most difficult way of programming your Mac. But it can also be the most efficient. Carefully written assembly is usually faster than the same thing written in C or Pascal or, God forbid, AppleScript! Another benefit of assembly, from my point of view, is that you only need to learn a set of about 60 commands (the 68k processor has 56 basic instructions, of which you will use maybe 30) instead of the bunch of commands used in C.

Since anything you write in assembly language is directly involved with the RAM, bus, and other hardware components of your Mac, you will first need an understanding of the different components of your computer. This file deals ONLY with the 68000 chip and how to program a computer that uses it. However, since the PPCs can emulate a 68k chip, there should be no problem with writing a program for a 68k Mac and running it on a PPC. Programming a PPC chip is also a bit more involved, so if you are a beginner it’s a good idea to understand the basics of programming a 68k Mac before attacking the PPC platform. (Notice how I used the word “Mac” to refer to a computer... I just realized that in a world of clones that is not politically correct, so please excuse my ignorance!)

--==< Basic Terminology >==--

- The Processor - This is where all the commands you write are sent and processed.
- The Bus - The path along which information travels between the processor and the memory.
- The Memory - A place where data is stored. Think of it as a bunch of boxes, each with its own address, where the processor can store data or retrieve it.
- Registers - Places within the processor where information is stored.
  There are two types of registers: data registers and address registers. Data registers are used to hold information, such as numbers. Address registers are used to point to an area in the memory where a piece of data can be found. In my simple head (a professional programmer would probably skin me alive for saying this) data registers hold the data that is being processed right now, while address registers point to places in the memory where data is stored for future use. For example, if I do a simple mathematical calculation, I would do it in a data register and then store the answer in the memory using an address register. There are 8 data registers and 8 address registers, numbered d0-d7 and a0-a7 (a7 is reserved as the stack pointer). So whenever I want to put a number into a data register I choose one of d0-d7. The purpose of registers and their uses will become a lot clearer when we actually start programming!
- The Stack and the Stack Pointer - The stack is a part of the memory where data can be stored temporarily, and the stack pointer is the register that keeps track of where the top of it is. As a matter of fact this is address register 7 (more about this later). The way it works is that you take a piece of data and push it onto the stack. Then you take another piece of data and push it on top of the first one. Later you remove them again, last one first. Think of it as stacking sheets of paper on top of each other. If you put three sheets of paper on top of each other, then in order to see what’s on the sheet you put down first you have to remove the other two.
- Addressing - The way an instruction says where its data is: in a register, in the memory at some address, or written right into the instruction itself.
- Program Counter - A special register that keeps track of where in the program the processor is.
- Number Systems - If you don’t know what this is, you should think twice about reading on! I’m sorry but I won’t go into what the different number systems are. Ask your math teacher! I’ll just let you know that in assembly language you really need a way of converting between decimal and hexadecimal numbers!
- Bytes, Words, and Longs - These refer to the size of a number. In assembly, a byte is a number two hex digits long (8 bits), a word is 4 hex digits long (16 bits), and a long is 8 hex digits long (32 bits). For example, a byte may be AF, a word ABCD, and a long 12345678. In practical terms, with a few exceptions, the only two uses for bytes, words, and longs are speed and number manipulation. For what is the use of moving 00000001 around when you can just move 01 around? And if you have 12345678 in a data register, you can move the last two digits around by just moving the byte.

--====--

Ok here we go... A smart thing now would be to give you a list of the commands that the 68k processor uses. However, since that was already published in the last issue of HackAddict, I’m gonna be a lamer and not give you that! (I guess you’ll just have to download it.) But I will explain the use of some of the main commands.

Well, let’s start with maybe the simplest example, adding two numbers together. The way you do it is by putting the first number into one of the data registers, putting the second number into a different data register, and adding the two data registers together. Let’s say we were adding 1FE and 2C together. In assembly you would do it like this:

        move.l  #$1FE,d0        *moves number 1FE into data register 0
        move.l  #$2C,d1         *moves number 2C into data register 1
        add.l   d1,d0           *adds data register 1 to data register 0

The “#” tells the assembler that what follows is the number itself rather than an address in the memory, and the “$” says that the number is written in hexadecimal. This is an easy example and I don’t think there should be any problem following it: after the last line d0 holds 22A. Notice how I used longs. In reality, when you put the long 1FE into d0, you actually put the number 000001FE into d0. You could do the above procedure like this as well:

        move.w  #$1FE,d0
        move.b  #$2C,d1
        add.w   d1,d0

The result would be exactly the same, as long as d0 and d1 held zero to begin with (move.w and move.b only change the last 4 and the last 2 digits of a register and leave the rest alone). There are a few things to point out here. Notice how in the first line I’m moving a word and how in the second line I’m moving a byte.
Remembering that a word can hold up to 4 digits and a byte only two, this should be quite understandable. Now the last line takes the word in d1 and adds it to the word in d0, and then stores the new value in d0. Now try to figure out what this does:

        move.w  #$1FE,d0
        move.b  #$2C,d1
        add.b   d0,d1

Here we changed only the last line. Notice how now only the byte stored in d0 will be added to d1. This means that only FE will be added to 2C, because a byte can only hold two digits and therefore only the last two digits are added. FE plus 2C is 12A, which does not fit into a byte either, so d1 ends up with 2A in its last two digits and the 1 is lost from the register (the processor notes it in its carry flag). Can you spot the potential for number manipulation? This is why it is important to keep track of which register holds what size of number! And now for a bit more complicated one:

        move.l  #$12345678,d0   *line 1
        move.l  #$10,d1         *line 2
loop:                           *line 3
        move.l  d0,d2           *line 4
        mulu.l  #$10000000,d2   *line 5
        divu.l  #$10000000,d2   *line 6
        add.b   d2,d3           *line 7
        mulu.l  #$10,d3         *line 8
        divu.l  d1,d0           *line 9
        bra     loop            *line 10

Believe it or not, this is the heart of a routine that turns the number 12345678 in d0 into 87654321, building the answer up in d3 (which we assume starts out as zero). One warning: the long forms mulu.l and divu.l only exist on the 68020 and later chips; the plain 68000 can only multiply and divide words.

I’ll start by explaining the different instructions. In line 3 I use the expression “loop:”. What this does is create a label named “loop”, a name for that spot in the program that other instructions can jump to. Labels are how you mark the start of a subroutine, and subroutines are what assembly is all about. A subroutine is a set of instructions that does a certain thing, and whenever you need that thing done you tell the computer to execute it and then return to where it left off. You would have a main loop, along with a bunch of subroutines, and whenever the main loop needs something done it goes to a certain subroutine. For example, a simplified word processor would have a main loop in which it waits for a key to be pressed. Once that happens, a subroutine is called which prints the key on the screen, and after that’s done the processor goes back to executing the main loop. Another important thing about subroutines is that you can use conditionals to branch to them.
For example, when you enter a password, the program checks the password, and if it is correct it “branches” to a certain subroutine. If it is not correct it branches to another one. So in our example the label is called “loop”, and by putting the colon behind it we tell the assembler that it’s a label. The last line means branch always to “loop”. This is sort of similar to the “goto” command in BASIC. Once the processor reaches this command it jumps to the label and executes whatever follows it. The instructions “divu” and “mulu” mean divide and multiply using unsigned arithmetic. I’m not going to get into what signed and unsigned numbers are because that is not for beginners.

Let us now follow the code through. The first two lines should be pretty clear. Line 3 marks the spot called “loop”. Line 4 copies the hex number 12345678 from d0 into data register 2. It is important that you realize that every number in this article is written in hexadecimal! (The processor itself only sees binary; hex is just the handiest way for us to write it.) Line 5 multiplies 12345678 by 10000000. What does this do? Get out your scientific calculator, or drop into MacsBug, and try it out. It should give you the number 123456780000000. Now, we must remember that a data register can only hold a long, meaning 8 digits. Therefore after line 5 is executed d2 will contain only the last 8 digits, 80000000. And after line 6 divides this by 10000000, d2 will contain the number 8. Cool huh? We had to do all these things just to take the last digit of d0. All the other lines should be quite understandable: line 7 adds 8 to d3 (which we assume started out as zero), and line 8 multiplies it by 10 (giving us 80). Then line 9 divides d0 by d1, which chops the last digit off and leaves 1234567. Why? You will see in the next loop! After all this is done, the processor reaches the BRA command and branches back to loop.
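If you don’t have MacsBug handy, the arithmetic in lines 4 to 9 can be checked with a few lines of Python. This is only a rough model and not 68k code: the names MASK32 and one_pass are made up for this sketch, d3 is assumed to start out as zero, and the “& MASK32” stands in for the fact that a data register keeps only 8 hex digits.

```python
MASK32 = 0xFFFFFFFF  # a data register keeps only 32 bits (8 hex digits)


def one_pass(d0, d3):
    """Model lines 4 to 9 of the loop once; returns the new (d0, d2, d3)."""
    d2 = d0                                      # line 4: move.l d0,d2
    d2 = (d2 * 0x10000000) & MASK32              # line 5: the upper digits fall off
    d2 = d2 // 0x10000000                        # line 6: what is left is the last digit
    d3 = (d3 & 0xFFFFFF00) | ((d3 + d2) & 0xFF)  # line 7: add.b changes only the low byte
    d3 = (d3 * 0x10) & MASK32                    # line 8: shift d3 left by one hex digit
    d0 = d0 // 0x10                              # line 9: drop the last digit of d0
    return d0, d2, d3


# First pass, starting from 12345678 with d3 assumed to be zero.
d0, d2, d3 = one_pass(0x12345678, 0)
print(hex(d0), hex(d2), hex(d3))  # 0x1234567 0x8 0x80

# Second pass, carrying d0 and d3 over from the first.
d0, d2, d3 = one_pass(d0, d3)
print(hex(d0), hex(d2), hex(d3))  # 0x123456 0x7 0x870
```

The masking in line 5 is the whole trick: the multiplication pushes everything but the last digit out of the top of the register, and the division in line 6 brings that one digit back down.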
What happens now is as follows: since d0 is now 1234567, by the time the processor reaches line 7 d2 will contain the number 7. That is added to d3, so that d3 contains 87, then it gets multiplied by 10 giving us 870, and the same thing happens again. On the 8th loop line 7 makes d3 equal to 87654321.

There are two problems, though. First, “bra” always branches, so the loop never stops; it just keeps shifting d3 over until all our digits have fallen off the top. Second, even on the 8th loop line 8 multiplies d3 by 10 one more time after the last digit has been added, and that pushes the 8 out of the register. So we need to make room in d3 before adding each digit rather than after, and make the loop repeat 8 times only. Like this:

        move.l  #$12345678,d0
        move.l  #$10,d1
        clr.l   d3
        clr.l   d5
loop:
        move.l  d0,d2
        mulu.l  #$10000000,d2
        divu.l  #$10000000,d2
        mulu.l  #$10,d3
        add.b   d2,d3
        divu.l  d1,d0
        add.b   #1,d5
        cmpi.b  #8,d5
        bne     loop
        rts

Here I swapped the multiply and the add, and added a few lines. The two “clr” lines clear data registers 3 and 5 (set them to zero), so that the answer and the loop counter both start from nothing. Near the end of the loop, d5 is increased by one. In the next line the compare command is used, and the number in d5, representing how many times the loop has been executed, is compared to 8. “bne” means branch if not equal, so as long as d5 has not reached 8 the processor goes back to loop. The last line means return from subroutine, and here it means the end of our program. If this code were only a subroutine of a bigger program, then once rts is reached, the next command executed would be the one after the command that told the processor to do the subroutine.

--====--

Well now you know how assembly works. But if you don’t know how the Mac OS works it is sort of difficult to write a program for it (unless you want to rewrite several parts of the OS). In reality, when you write something in assembly language you let the Mac OS do most of the work for you. For example, when you want a window to appear, you tell the OS a couple of things, like the size and the title of the window, and then you let the OS do the rest for you.
For example, to display an alert box (one that is saved in the program’s resource fork) you would write the following:

        clr.w   -(sp)
        move.w  d0,-(sp)
        clr.l   -(sp)
        dc.w    alert
        move.w  (sp)+,d0

Now that doesn’t look complicated, if only you knew what the hell it meant! First of all, sp refers to the stack pointer (address register 7, remember?). The minus sign in front of (sp) and the plus sign behind it say whether the sp is decremented or incremented. Wow, there are two expressions for you! Remember how an address register points to an address in the memory? Let’s say that sp is pointing to location 100 in the memory. Now remember how the stack works on the principle of pushing things onto it and pulling them off? To push the word “abcd” I write it to -(sp): the sp is first decremented by 2 (a word takes up 2 bytes; a long would take 4), so that it points to 98, and then “abcd” is stored at location 98. If I want to push something else as well, the sp is decremented again and the new data goes below “abcd”. Taking things off works the other way around: reading from (sp)+ fetches the word that sp points to and then increments the sp, so once “abcd” is taken off the stack the sp points to 100 again. Well, that’s the idea behind it...

So the first line clears a space for a word on the stack; this is where the OS will leave its answer. The next line pushes the ID of the alert you want to use, which we assume is already in d0, onto the stack. Then comes another clearing, this time of a long (an empty long for a parameter of the alert that we are not using), followed by dc.w. This is not really a command for the processor: it tells the assembler to put the word “alert” right into the program at this spot. “alert” stands for a number that you have already defined, and when the processor runs into that number it hands control over to the alert routine in the OS. Once the alert is done, the number of the button pressed is waiting in the space we cleared in the first line, and the last line pulls it off the stack into d0. This is what is commonly referred to as an OS Trap.

ProZaq